Dominant subspace



Sparse Quadratic Optimisation over the Stiefel Manifold with Application to Permutation Synchronisation

Bernard, Florian, Cremers, Daniel, Thunberg, Johan

Neural Information Processing Systems

We address the non-convex optimisation problem of finding a sparse matrix on the Stiefel manifold (matrices with mutually orthogonal columns of unit length) that maximises (or minimises) a quadratic objective function. Optimisation problems on the Stiefel manifold occur for example in spectral relaxations of various combinatorial problems, such as graph matching, clustering, or permutation synchronisation. Although sparsity is a desirable property in such settings, it is mostly neglected in spectral formulations since existing solvers, e.g. based on eigenvalue decomposition, are unable to account for sparsity while at the same time maintaining global optimality guarantees.


Accelerating Neural Network Training Along Sharp and Flat Directions

Zakarin, Daniyar, Singh, Sidak Pal

arXiv.org Machine Learning

Recent work has highlighted a surprising alignment between gradients and the top eigenspace of the Hessian -- termed the Dominant subspace -- during neural network training. Concurrently, there has been growing interest in the distinct roles of sharp and flat directions in the Hessian spectrum. In this work, we study Bulk-SGD, a variant of SGD that restricts updates to the orthogonal complement of the Dominant subspace. Through ablation studies, we characterize the stability properties of Bulk-SGD and identify critical hyperparameters that govern its behavior. We show that updates along the Bulk subspace, corresponding to flatter directions in the loss landscape, can accelerate convergence but may compromise stability. To balance these effects, we introduce interpolated gradient methods that unify SGD, Dom-SGD, and Bulk-SGD. Finally, we empirically connect this subspace decomposition to the Generalized Gauss-Newton and Functional Hessian terms, showing that curvature energy is largely concentrated in the Dominant subspace. Our findings suggest a principled approach to designing curvature-aware optimizers.
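The core operation in Bulk-SGD can be sketched in a few lines: compute the top-k eigenvectors of the Hessian and remove their span from the gradient. This is a minimal illustration on an explicit toy Hessian, not the paper's implementation (which would estimate the dominant subspace from minibatch curvature rather than a full eigendecomposition); the function name and the choice of `k` are illustrative.

```python
import numpy as np

def bulk_update(hessian, grad, k):
    """Project a gradient onto the orthogonal complement of the
    top-k Hessian eigenspace (the 'Bulk' direction)."""
    # eigh returns eigenvalues in ascending order; take the last k vectors
    _, vecs = np.linalg.eigh(hessian)
    V = vecs[:, -k:]                 # dominant subspace basis (d x k)
    return grad - V @ (V.T @ grad)   # remove the dominant component

# Toy quadratic with one sharp direction and two flat ones
H = np.diag([100.0, 1.0, 0.5])
g = np.array([3.0, 2.0, 1.0])
bulk = bulk_update(H, g, k=1)        # sharp component of g is stripped
```

The interpolated methods mentioned in the abstract would then take a convex combination of this Bulk component and the dominant component, recovering plain SGD at equal weights.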


I3S: Importance Sampling Subspace Selection for Low-Rank Optimization in LLM Pretraining

Zhang, Haochen, Yin, Junze, Wang, Guanchu, Liu, Zirui, Zhang, Tianyi, Shrivastava, Anshumali, Yang, Lin, Braverman, Vladimir

arXiv.org Artificial Intelligence

Low-rank optimization has emerged as a promising approach to enabling memory-efficient training of large language models (LLMs). Existing low-rank optimization methods typically project gradients onto a low-rank subspace, reducing the memory cost of storing optimizer states. A key challenge in these methods is identifying suitable subspaces to ensure an effective optimization trajectory. Most existing approaches select the dominant subspace to preserve gradient information, as this intuitively provides the best approximation. However, we find that in practice, the dominant subspace stops changing during pretraining, thereby constraining weight updates to similar subspaces. In this paper, we propose importance sampling subspace selection (I3S) for low-rank optimization, which theoretically offers a comparable convergence rate to the dominant subspace approach. Empirically, we demonstrate that I3S significantly outperforms previous methods in LLM pretraining tasks.
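The idea of importance sampling a subspace, as opposed to always keeping the top singular directions, can be sketched as follows. This is a hypothetical reading of the abstract, not the paper's algorithm: the sampling distribution (squared singular values) and the function name are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def i3s_projection(grad, k):
    """Sample a rank-k subspace with probability proportional to the
    squared singular values, rather than always taking the top k."""
    U, s, _ = np.linalg.svd(grad, full_matrices=False)
    probs = s**2 / np.sum(s**2)
    idx = rng.choice(len(s), size=k, replace=False, p=probs)
    P = U[:, idx]                  # sampled subspace basis (m x k)
    return P, P @ (P.T @ grad)     # basis and low-rank projected gradient

G = rng.standard_normal((8, 4))    # stand-in for a weight-matrix gradient
P, G_low = i3s_projection(G, k=2)
```

Because the sampled subspace changes across refresh steps, weight updates are not confined to one nearly-static dominant subspace, which is the failure mode the abstract describes.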


Does SGD really happen in tiny subspaces?

Song, Minhak, Ahn, Kwangjun, Yun, Chulhee

arXiv.org Machine Learning

Understanding the training dynamics of deep neural networks is challenging due to their high-dimensional nature and intricate loss landscapes. Recent studies have revealed that, along the training trajectory, the gradient approximately aligns with a low-rank top eigenspace of the training loss Hessian, referred to as the dominant subspace. Given this alignment, this paper explores whether neural networks can be trained within the dominant subspace, which, if feasible, could lead to more efficient training methods. Our primary observation is that when the SGD update is projected onto the dominant subspace, the training loss does not decrease further. This suggests that the observed alignment between the gradient and the dominant subspace is spurious. Surprisingly, projecting out the dominant subspace proves to be just as effective as the original update, despite removing the majority of the original update component. Similar observations are made for the large learning rate regime (also known as Edge of Stability) and Sharpness-Aware Minimization. We discuss the main causes and implications of this spurious alignment, shedding light on the intricate dynamics of neural network training.
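The alignment statistic underlying these observations is the fraction of squared gradient norm that lies in the top-k Hessian eigenspace. A minimal sketch, assuming access to an explicit Hessian (in practice this is estimated with iterative eigensolvers):

```python
import numpy as np

def dominant_alignment(hessian, grad, k):
    """Fraction of the squared gradient norm lying in the top-k
    Hessian eigenspace -- the alignment quantity studied here."""
    _, vecs = np.linalg.eigh(hessian)   # eigenvalues in ascending order
    V = vecs[:, -k:]                    # dominant subspace basis
    return float(np.sum((V.T @ grad) ** 2) / np.sum(grad ** 2))

H = np.diag([50.0, 40.0, 1.0, 0.5])
g = np.array([2.0, 1.0, 0.1, 0.1])
frac = dominant_alignment(H, g, k=2)    # close to 1: strong alignment
```

The paper's point is that a value of `frac` near 1 can be spurious: the aligned component may contribute little to loss decrease, while the small residual in the complement does the useful work.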


A Federated Data Fusion-Based Prognostic Model for Applications with Multi-Stream Incomplete Signals

Arabi, Madi, Fang, Xiaolei

arXiv.org Machine Learning

Industrial prognostics aims to predict the failure time of machines by utilizing their degradation signals. This is typically achieved by establishing a statistical learning model that maps the degradation signals of machines to their times-to-failure (TTFs) [1, 2]. As with many other statistical learning models, the implementation of prognostic models usually consists of two steps: model training and real-time monitoring (also known as model testing or deployment). Model training uses a historical dataset comprising the degradation signals and TTFs of failed machines to estimate the parameters of the prognostic model; real-time monitoring feeds the real-time degradation signals from a partially degraded onsite machine into the trained model to predict its TTF or TTF distribution. Most existing prognostic models assume that a historical dataset from a sufficient number of failed machines is available for model training [3, 4, 5, 6, 7]. In reality, however, the amount of historical data owned by a single organization (e.g., a company, a university lab, or a factory) may not be large enough to train a reliable prognostic model.


Sparse Quadratic Optimisation over the Stiefel Manifold with Application to Permutation Synchronisation

Bernard, Florian, Cremers, Daniel, Thunberg, Johan

arXiv.org Machine Learning

We address the non-convex optimisation problem of finding a sparse matrix on the Stiefel manifold (matrices with mutually orthogonal columns of unit length) that maximises (or minimises) a quadratic objective function. Optimisation problems on the Stiefel manifold occur for example in spectral relaxations of various combinatorial problems, such as graph matching, clustering, or permutation synchronisation. Although sparsity is a desirable property in such settings, it is mostly neglected in spectral formulations since existing solvers, e.g. based on eigenvalue decomposition, are unable to account for sparsity while at the same time maintaining global optimality guarantees. We fill this gap and propose a simple yet effective sparsity-promoting modification of the Orthogonal Iteration algorithm for finding the dominant eigenspace of a matrix. By doing so, we can guarantee that our method finds a Stiefel matrix that is globally optimal with respect to the quadratic objective function, while in addition being sparse. As a motivating application we consider the task of permutation synchronisation, which can be understood as a constrained clustering problem that has particular relevance for matching multiple images or 3D shapes in computer vision, computer graphics, and beyond. We demonstrate that the proposed approach outperforms previous methods in this domain.
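The baseline the paper modifies, Orthogonal Iteration, alternates multiplication by the matrix with re-orthonormalisation. Below is a sketch of that iteration with a soft-thresholding step inserted as one plausible sparsity-promoting modification; the actual modification in the paper may differ, and `lam` is an illustrative parameter (with `lam=0` this is plain Orthogonal Iteration).

```python
import numpy as np

def sparse_orthogonal_iteration(A, k, lam=0.0, iters=200, seed=0):
    """Orthogonal iteration for the dominant k-dimensional eigenspace of a
    symmetric matrix A, with an (assumed) soft-thresholding step to
    encourage sparsity; lam=0 recovers plain orthogonal iteration."""
    rng = np.random.default_rng(seed)
    X = np.linalg.qr(rng.standard_normal((A.shape[0], k)))[0]
    for _ in range(iters):
        Y = A @ X
        Y = np.sign(Y) * np.maximum(np.abs(Y) - lam, 0.0)  # promote sparsity
        X, _ = np.linalg.qr(Y)                             # re-orthonormalise
    return X

A = np.diag([5.0, 4.0, 1.0, 0.5])
X = sparse_orthogonal_iteration(A, k=2)   # converges to span{e1, e2}
```

Convergence of the plain iteration is governed by the eigenvalue gap between the k-th and (k+1)-th eigenvalues; the paper's contribution is showing that sparsity can be added without losing the global optimality guarantee of the spectral solution.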


Network topology change-point detection from graph signals with prior spectral signatures

Kaushik, Chiraag, Roddenberry, T. Mitchell, Segarra, Santiago

arXiv.org Machine Learning

We consider the problem of sequential graph topology change-point detection from graph signals. We assume that signals on the nodes of the graph are regularized by the underlying graph structure via a graph filtering model, which we then leverage to distill the graph topology change-point detection problem to a subspace detection problem. We demonstrate how prior information on the spectral signature of the post-change graph can be incorporated to implicitly denoise the observed sequential data, thus leading to a natural CUSUM-based algorithm for change-point detection. Numerical experiments illustrate the performance of our proposed approach, particularly underscoring the benefits of (potentially noisy) prior information.
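The CUSUM recursion at the heart of such detectors is standard: accumulate per-sample evidence of a change, clip at zero, and raise an alarm when the statistic crosses a threshold. This sketch uses generic scalar scores; in the paper those scores would come from the subspace-detection statistic on each incoming graph signal, which is not reproduced here.

```python
import numpy as np

def cusum(scores, threshold):
    """Classic one-sided CUSUM: accumulate positive evidence of a change,
    clip at zero, and flag the first time the statistic exceeds the
    threshold. Returns the alarm index, or None if no alarm is raised."""
    S, alarm = 0.0, None
    for t, s in enumerate(scores):
        S = max(0.0, S + s)
        if S > threshold:
            alarm = t
            break
    return alarm

# Per-sample scores: negative drift before the change, positive after it
scores = [-0.5] * 20 + [1.0] * 20
t_alarm = cusum(scores, threshold=5.0)
```

The threshold trades off detection delay against false-alarm rate; the prior spectral information in the paper effectively sharpens the scores, so the statistic grows faster after a true change.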